Abstract

Quantum circuits currently constitute a dominant model for quantum computation [14]. Our work addresses
the problem of constructing quantum circuits to implement an arbitrary given quantum computation, in the
special case of two qubits. We pursue circuits without ancilla qubits and with as small a number of elementary
quantum gates [1, 15] as possible. Our lower bound for worst-case optimal two-qubit circuits calls for at least 17
gates: 15 one-qubit rotations and 2 CNOTs. In the other direction, we constructively prove a worst-case upper bound of 23
elementary gates, of which at most 4 (CNOTs) entail multi-qubit interactions. Our analysis shows that synthesis
algorithms suggested in previous work, although more general, entail much larger quantum circuits than ours in
the special case of two qubits. One such algorithm [5] has a worst case of 61 gates, of which 18 may be CNOTs.

Our techniques rely on the KAK decomposition from Lie theory as well as the polar and spectral (symmetric
Schur) matrix decompositions from numerical analysis and operator theory. They are related to the canonical
decomposition of a two-qubit gate with respect to the "magic basis" of phase-shifted Bell states [12, 13]. We
further extend this decomposition in terms of elementary gates for quantum computation.
1 Introduction



Quantum computations can be described by unitary matrices [14]. In order to effect a quantum computation on
a quantum computer, one must decompose such a matrix into a quantum circuit, which consists of elementary
quantum gates [1] connected by Kronecker (tensor) and matrix products. Those connections are often represented
using quantum circuit schematics. In some cases circuit decompositions require temporarily increasing the
dimension of the underlying Hilbert space, which is represented by "temporary storage lines". Since there is always a
multitude of valid circuit decompositions, one typically prefers those with fewer gates.

Algorithms for classical logic circuit synthesis [8] read a Boolean function and output a circuit that implements
the function using gates from a given gate library. By analogy, we can talk about quantum circuit synthesis. In this
work we only discuss purely classical algorithms for such synthesis problems. Even at this early stage of quantum
computing, it seems clear that algorithms for circuit synthesis are going to be as important in quantum computing
as they are in classical Electronic Design Automation, where commercial circuit synthesis tools are necessary for
the design of cellular phones, game consoles and networking chips.

If a Boolean function is given by its truth table, then a two-level circuit, linear in the size of the truth table,
can be constructed immediately. Thus, it is the optimization of the circuit structure that makes classical circuit
synthesis interesting. Given a unitary matrix, it is not nearly as easy to find a quantum circuit that implements it.
Generic algorithms for this problem are known [16, 5], but in some cases produce very large circuits even when
small circuits are possible. We hope that additional optimizations are possible. Importantly, the work in [16]
suggests that generic circuit decompositions can be found by means of solving a series of specialized synthesis
problems, e.g., the synthesis of circuits consisting of NOT, CNOT and TOFFOLI gates as well as phase-shift
circuits. Such specialized synthesis problems are addressed by other researchers [1, 15, 17].

A recent work [12] on time-optimal control of spin systems presents a holistic view of circuit-related
optimizations, which is based on Lie group theory. However, their approach is not as detailed as previously published
circuit synthesis algorithms, and comparisons in terms of gate counts are not straightforward.

Our work can be compared to the GQC "quantum compiler" [7, 3] available online.^1 That program inputs
a 4 x 4 unitary U and returns a "canonical decomposition" which is not, in a strict sense, a circuit in terms of
elementary gates. It also returns a circuit that computes CNOT using U and one-qubit gates. When U is used only
once, this easily yields a circuit decomposition of U in terms of elementary gates. However, it appears that not all
input matrices can be processed successfully.^2

Our work pursues generic circuit decompositions [1, 5] of two-qubit quantum computations up to global
phase. While some authors consider arbitrary one-qubit gates elementary, we recall that they can be decomposed,
up to phase, into a product of one-parametric rotations according to Equation 3. Therefore we view only the
necessary one-parametric rotations as elementary. Some of our results (constructive upper bounds) in terms of
such elementary gates can be reformulated in terms of coarser elementary gates. We also observe that the standard
choice of elementary logic gates in classical computing (AND-OR-NOT) was suggested in the 19th century by
Boole for abstract reasons rather than based on specific technologies. Today the AND gate is far from the simplest
gate to implement in CMOS-based integrated circuits. Commercial circuit synthesis tools address this fact by
decoupling library-less logic synthesis from technology mapping [8]. The former uses an abstract gate library,
such as AND-OR-NOT, and emphasizes the scalability of synthesis algorithms that capture the global structure of
the given computation. The latter step maps logic circuits to a technology-specific gate library, often supplied
by a semiconductor manufacturer, and is based on local optimizations. Technology-specific libraries may contain
composite multi-input gates with optimized layouts such as the AOI gate (AND/OR/INVERTER).

In this sense, our algorithms are analogous to library-less logic synthesis.

^1 We point out that the term "compiler" in classical computing means "translator from a high-level description to a register-transfer level
(RTL) description, e.g., machine codes". The task of producing circuits with a given function is commonly referred to as "circuit synthesis". In
this context, digital circuits are called "logic circuits".

^2 As of March 2003, the quantum compiler fails on exp i ... the problem lies in the code rather than the method.
Gate library. We consider the following library of elementary one- and two-qubit gates:

• $R_y(\theta) = \begin{pmatrix} \cos\theta/2 & \sin\theta/2 \\ -\sin\theta/2 & \cos\theta/2 \end{pmatrix}$ for all $0 \le \theta < 2\pi$;

• $R_z(\alpha) = \begin{pmatrix} e^{-i\alpha/2} & 0 \\ 0 & e^{i\alpha/2} \end{pmatrix}$ for all $0 \le \alpha < 2\pi$;

• The CNOT gate, conditioned on either line.

A given gate may, in principle, be applied to different lines. We do not restrict to which lines the above
gates may be applied. Note that the gate library we use generates U(4) up to global phase [5]. In order to find
gate decompositions, we use the Lie-group techniques from [12]. The resulting procedure is often superior to
previously published generic algorithms [16, 5] in terms of the size of synthesized circuits.

Theorem 1.1 Up to global phase, any two-qubit computation may be realized exactly by at most twenty-three
elementary gates, of which at most four are CNOTs. No ancilla qubits are required.

We do not know whether this result is optimal, but show that at least seventeen elementary gates are required. 

The remaining part of the paper is organized as follows. Section 2 covers the necessary background on
quantum circuits and elementary gates for quantum computation [1]. Relevant matrix decompositions and prior work
on circuit synthesis are described in Section 3, including a related algorithm to decompose unitary matrices into
elementary gates [5]. Section 4 introduces the "magic basis" from [13], as well as the associated entangler and
disentangler gates. In Section 5 we present a generic decomposition of an arbitrary two-qubit quantum
computation into 23 elementary gates or fewer using the KAK decomposition from Lie theory. We also give several
examples. Lower bounds are discussed in Section 6, followed by conclusions and ongoing work in Section 7.



2 Notation and Background 

$GL(2^k) = \{M \in (2^k \times 2^k)\text{-matrices} \mid \det(M) \neq 0\}$. For $M \in GL(2^k)$, we consider its adjoint matrix $M^*$, produced
from the transpose $M^t$ by conjugating each matrix element. M is called Hermitian (synonym: self-adjoint) iff
$M = M^*$. Hermitian matrices generalize symmetric real-valued matrices.

Quantum states and quantum circuits are governed by the laws of quantum mechanics: $k$-qubit states are
$2^k$-dimensional vectors, i.e., complex linear combinations of 0-1 bit-strings of length $k$. A quantum computation
acting on $k$ qubits ($k$ inputs and $k$ outputs) is modelled by a unitary $2^k \times 2^k$ matrix [14]. We denote such matrices
by $U(2^k) = \{M \in (2^k \times 2^k)\text{-matrices} \mid MM^* = 1\}$. $O(2^k)$ represents those matrices from $U(2^k)$ with real entries.
$SU(2^k)$ and $SO(2^k)$ are the respective subsets with determinant one. Below, we will consider two generic elements
of $SU(2)$: $A = \alpha E_{11} + (-\beta)E_{12} + \bar\beta E_{21} + \bar\alpha E_{22}$ and $B = \gamma E_{11} + (-\delta)E_{12} + \bar\delta E_{21} + \bar\gamma E_{22}$, where $E_{ij}$ is the
matrix with 1 in position $(i,j)$ and 0 elsewhere, and $1 = |\alpha|^2 + |\beta|^2 = |\gamma|^2 + |\delta|^2$. Such a parameterization of $SU(2)$ can be verified directly.

We largely ignore the effects of quantum measurement that is typically performed after a quantum circuit is
applied, but we use the fact that any measurement is invariant under a global phase change. In mathematical terms,
this means that any computation in $U(2^k)$ can be represented in normalized form by a matrix from $SU(2^k)$.

2.1 Quantum circuits and elementary gates for quantum computation 

In our work, we only discuss combinational quantum circuits, which are directed acyclic graphs where every
vertex represents a gate. An output of a gate can be connected to exactly one input of another gate or one circuit
output. A similar restriction applies to gate inputs (see the examples of quantum circuits in the figures).

Following [1, 15], we attempt to express arbitrary computations using as small a number of elementary gates
as possible. In order to write matrix elements of particular gates, we order the elements of the computational
basis lexicographically [14]. The computation implemented by several gates acting independently on different
qubits can be described by the Kronecker (tensor) product $\otimes$ of their matrices. In the usual computational basis
$|00\rangle, |01\rangle, |10\rangle, |11\rangle$ ordered in the dictionary order, the matrix in $U(4)$ representing $A \otimes B$ (for A and B defined
above) will then be

\[ A \otimes B = \begin{pmatrix} \alpha B & -\beta B \\ \bar\beta B & \bar\alpha B \end{pmatrix} \tag{1} \]



Composition of multiple quantum computations is described by the matrix product. However, as most circuit
diagrams are read left-to-right, the order in respective matrix expressions is reversed. For example, the expression
$(A \otimes B)(C \otimes D)$ corresponds to a two-qubit circuit where C acts on the top line and D on the bottom line, followed
by A acting on the top line and B on the bottom line. Since the two lines do not interact, the same computation is
performed by AC acting on the top line and BD acting on the bottom line independently, i.e., $(A \otimes B)(C \otimes D) =
(AC \otimes BD)$. Sometimes this identity allows one to simplify quantum circuits and reduce their gate counts.

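We distinguish two versions of the CNOT gate, topCNOT and botCNOT, conditioned on the top and bottom
lines respectively: (i) botCNOT exchanges $|01\rangle \leftrightarrow |11\rangle$, i.e. it is the CNOT controlled by the bottom line, and (ii) topCNOT
exchanges $|10\rangle \leftrightarrow |11\rangle$. Those gates can be represented by matrices:

\[ \mathrm{topCNOT} = \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 1 \\ 0 & 0 & 1 & 0 \end{pmatrix}, \qquad
\mathrm{botCNOT} = \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 0 & 0 & 1 \\ 0 & 0 & 1 & 0 \\ 0 & 1 & 0 & 0 \end{pmatrix} \tag{2} \]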
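The conventions above can be checked numerically. The following is a minimal sketch, assuming NumPy is available; the helper `random_unitary` and all variable names are illustrative and do not come from the paper. It builds the two CNOT matrices in the basis order |00>, |01>, |10>, |11> and tests the mixed-product identity for Kronecker products.

```python
import numpy as np

# Basis order: |00>, |01>, |10>, |11> (the top qubit is the first index).
top_cnot = np.array([[1, 0, 0, 0],
                     [0, 1, 0, 0],
                     [0, 0, 0, 1],
                     [0, 0, 1, 0]], dtype=complex)  # exchanges |10> and |11>
bot_cnot = np.array([[1, 0, 0, 0],
                     [0, 0, 0, 1],
                     [0, 0, 1, 0],
                     [0, 1, 0, 0]], dtype=complex)  # exchanges |01> and |11>

def random_unitary(n, rng):
    """Hypothetical helper: unitary factor of the QR decomposition of a Gaussian matrix."""
    z = rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n))
    q, _ = np.linalg.qr(z)
    return q

rng = np.random.default_rng(0)
A, B, C, D = (random_unitary(2, rng) for _ in range(4))

# Mixed-product identity: (A (x) B)(C (x) D) = (AC) (x) (BD).
mixed_product_ok = np.allclose(np.kron(A, B) @ np.kron(C, D),
                               np.kron(A @ C, B @ D))

# Swapping the two lines turns topCNOT into botCNOT.
swap = np.eye(4)[[0, 2, 1, 3]]
cnots_related = np.allclose(swap @ top_cnot @ swap, bot_cnot)
```

Both flags should evaluate to `True`; the second confirms that the two CNOT matrices differ only by which line carries the control.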



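An arbitrary one-qubit quantum computation can be implemented, up to phase, by three elementary gates.
This is due to [1, Lemma 4.1], which decomposes an arbitrary 2 x 2 unitary into

\[ U = \begin{pmatrix} e^{i\delta} & 0 \\ 0 & e^{i\delta} \end{pmatrix}
\begin{pmatrix} e^{-i\alpha/2} & 0 \\ 0 & e^{i\alpha/2} \end{pmatrix}
\begin{pmatrix} \cos\theta/2 & \sin\theta/2 \\ -\sin\theta/2 & \cos\theta/2 \end{pmatrix}
\begin{pmatrix} e^{-i\beta/2} & 0 \\ 0 & e^{i\beta/2} \end{pmatrix} \tag{3} \]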



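To recover the non-$\delta$ parameters, we divide U by a square root $e^{i\delta}$ of its determinant. The resulting matrix $\tilde U$ has $\delta = 0$, and

\[ \tilde U^{-1} \begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix} \tilde U =
\begin{pmatrix} \cos\theta & e^{i\beta}\sin\theta \\ e^{-i\beta}\sin\theta & -\cos\theta \end{pmatrix} \tag{4} \]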
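The parameters of the one-qubit decomposition can also be read off the matrix entries directly. The sketch below is one possible way to do this, not the paper's procedure; it assumes NumPy and the conventions R_z(a) = diag(e^{-ia/2}, e^{ia/2}) and R_y(t) = [[cos t/2, sin t/2], [-sin t/2, cos t/2]]. The function names `rz`, `ry`, `zyz_angles` and `rebuild` are hypothetical.

```python
import numpy as np

def rz(a):
    """diag(e^{-ia/2}, e^{ia/2})."""
    return np.diag([np.exp(-1j * a / 2), np.exp(1j * a / 2)])

def ry(t):
    """Real rotation with half-angle entries."""
    c, s = np.cos(t / 2), np.sin(t / 2)
    return np.array([[c, s], [-s, c]], dtype=complex)

def zyz_angles(u, tol=1e-12):
    """Return (delta, alpha, theta, beta) with u = e^{i delta} rz(alpha) ry(theta) rz(beta)."""
    delta = np.angle(np.linalg.det(u)) / 2      # det(u) = e^{2 i delta}
    v = np.exp(-1j * delta) * u                 # v has determinant one
    theta = 2 * np.arctan2(abs(v[0, 1]), abs(v[0, 0]))
    # v[0, 0] = e^{-i(alpha+beta)/2} cos(theta/2), v[0, 1] = e^{-i(alpha-beta)/2} sin(theta/2)
    total = -2 * np.angle(v[0, 0]) if abs(v[0, 0]) > tol else 0.0
    diff = -2 * np.angle(v[0, 1]) if abs(v[0, 1]) > tol else 0.0
    return delta, (total + diff) / 2, theta, (total - diff) / 2

def rebuild(delta, alpha, theta, beta):
    return np.exp(1j * delta) * rz(alpha) @ ry(theta) @ rz(beta)

hadamard = np.array([[1, 1], [1, -1]], dtype=complex) / np.sqrt(2)
pauli_x = np.array([[0, 1], [1, 0]], dtype=complex)
```

Calling `rebuild(*zyz_angles(hadamard))` should reproduce the Hadamard matrix. The angles are not unique, so the sketch may return a different but equivalent parameter set than a hand derivation.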



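We routinely ignore global phase because it does not affect the result of quantum measurement, which is the last
step in quantum algorithms. A particular one-qubit computation, the Hadamard gate H, can be implemented, up
to global phase, using two elementary gates as follows:

\[ H = \frac{\sqrt2}{2}\begin{pmatrix} 1 & 1 \\ 1 & -1 \end{pmatrix}
= \begin{pmatrix} -i & 0 \\ 0 & -i \end{pmatrix}
\begin{pmatrix} i & 0 \\ 0 & -i \end{pmatrix}
\frac{\sqrt2}{2}\begin{pmatrix} 1 & 1 \\ -1 & 1 \end{pmatrix} \tag{5} \]

Similarly, the NOT gate (also known as Pauli-X) requires two elementary gates, up to a global phase:

\[ X = \mathrm{NOT} = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}
= \begin{pmatrix} -i & 0 \\ 0 & -i \end{pmatrix}
\begin{pmatrix} i & 0 \\ 0 & -i \end{pmatrix}
\begin{pmatrix} 0 & 1 \\ -1 & 0 \end{pmatrix} \tag{6} \]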



2.2 Circuits for diagonal unitaries

For a diagonal matrix $D \in U(4)$, we have $D = \mathrm{diag}(z_1, z_2, z_3, z_4)$ with $z_i \bar z_i = 1$, $i = 1 \ldots 4$. The coordinates or their
product can be normalized by choosing the global phase. In contrast, the quantity $z_1 z_2^{-1} z_3^{-1} z_4$ is invariant.

Proposition 2.1 i) A diagonal matrix $D = \mathrm{diag}(z_1, z_2, z_3, z_4)$ in $U(4)$ may be written as a tensor product of
diagonal elements of $U(2)$ iff $z_1 z_2^{-1} z_3^{-1} z_4 = 1$. ii) Any gate which is diagonal when written in the computational basis
may be implemented up to phase in five elementary gates or fewer.
Figure 1: Any 4 x 4 diagonal unitary $D = \mathrm{diag}(z_1, z_2, z_3, z_4)$ may be decomposed into up to five elementary gates.
We set $e^{i\varphi} = z_1 z_2^{-1} z_3^{-1} z_4$ and define $W = \mathrm{diag}(e^{i\varphi/4}, e^{-i\varphi/4})$. The two one-qubit unitaries on the right are diagonal.
Since the inverse of a diagonal matrix is also diagonal, the form of this circuit can be reversed for any given matrix.



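Proof: i) The forward implication follows from $\mathrm{diag}(\eta_1, \eta_2) \otimes \mathrm{diag}(\eta_3, \eta_4) = \mathrm{diag}(\eta_1\eta_3, \eta_1\eta_4, \eta_2\eta_3, \eta_2\eta_4)$.
For the reverse implication, rewrite that as

\[ \mathrm{diag}(e^{i\theta_1}, e^{i\theta_2}) \otimes \mathrm{diag}(e^{i\theta_3}, e^{i\theta_4}) = \mathrm{diag}(e^{i(\theta_1+\theta_3)}, e^{i(\theta_1+\theta_4)}, e^{i(\theta_2+\theta_3)}, e^{i(\theta_2+\theta_4)}) \]

If we are given the four diagonal entries $z_1, \ldots, z_4$ and wish to find the $\theta_j$, this can be achieved by taking logarithms of
the $z_k$ and solving the resulting linear system in terms of $\theta_1, \ldots, \theta_4$. The matrix of this 4 x 4 system is degenerate and
has rank 3. However, the constraint $z_1 z_2^{-1} z_3^{-1} z_4 = 1$ ensures that the system is consistent, so a solution exists
(it is unique up to adding a constant to $\theta_1, \theta_2$ and subtracting it from $\theta_3, \theta_4$).

ii) Consider the computation of Figure 1. For a fixed $D = \mathrm{diag}(z_1, z_2, z_3, z_4)$, put $e^{i\varphi} = z_1 z_2^{-1} z_3^{-1} z_4$. Now note
the leftmost three gates enact

\[ |00\rangle \mapsto e^{i\varphi/4}|00\rangle, \quad |01\rangle \mapsto e^{-i\varphi/4}|01\rangle, \quad |10\rangle \mapsto e^{-i\varphi/4}|10\rangle, \quad |11\rangle \mapsto e^{i\varphi/4}|11\rangle \tag{7} \]

Thus by Equation 7 and part one of the present proposition, the difference between D and the leftmost three gates
is a pair of single elementary gates which are diagonal elements of $U(1) \oplus U(1)$ on each line. □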
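The construction in the proof can be tried on concrete numbers. The sketch below assumes NumPy and assumes that the three leftmost gates of Figure 1 are topCNOT, W on the bottom line, topCNOT, which reproduces the phase pattern stated in the proof; the figure itself is not reproduced here, and the function name `diagonal_circuit` is illustrative.

```python
import numpy as np

top_cnot = np.eye(4)[[0, 1, 3, 2]]   # exchanges |10> and |11>

def diagonal_circuit(z):
    """Split diag(z) into (CNOT, W, CNOT) and two one-qubit diagonal gates."""
    z = np.asarray(z, dtype=complex)
    phi = np.angle(z[0] * z[3] / (z[1] * z[2]))      # e^{i phi} = z1 z2^-1 z3^-1 z4
    w = np.diag([np.exp(1j * phi / 4), np.exp(-1j * phi / 4)])
    core = top_cnot @ np.kron(np.eye(2), w) @ top_cnot
    rest = z / np.diag(core)                         # invariant of `rest` equals 1
    d_top = np.diag([1, rest[2] / rest[0]])          # diagonal gate on the top line
    d_bot = np.diag([rest[0], rest[1]])              # diagonal gate on the bottom line
    return core, d_top, d_bot

rng = np.random.default_rng(5)
z = np.exp(1j * rng.uniform(0, 2 * np.pi, size=4))   # random unit-modulus entries
core, d_top, d_bot = diagonal_circuit(z)
rebuilt = np.kron(d_top, d_bot) @ core
```

Here `rebuilt` should agree with `np.diag(z)`. The count is two CNOTs, one gate for W and one diagonal gate per line, five in total.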

3 Matrix Decompositions and Prior Work 

As shown above, quantum circuits can be modelled by matrix formulas that decompose the overall computation
(one large unitary matrix) into matrix products and tensor products of elementary gates (smaller unitary matrices).
This suggests the use of matrix decomposition theorems from numerical analysis and Lie theory. Below, we revisit
only decompositions relevant to our work: SVD, polar, symmetric Schur (spectral), QR [10] and KAK [11].
Additionally, (i) a block-2 x 2 version of the SVD called the CS decomposition [10, pp. 77-79] was used for circuit synthesis
in [16], and (ii) the LU decomposition [10] was used to analyze CNOT-based circuits in [1]. Most of those
decompositions can be computed with the existing software package LAPACK, downloadable from http://www.netlib.org

3.1 Quantum circuit synthesis via the QR decomposition 

The unitary matrix of a quantum computation can be analogized with the truth table of a classical logic circuit.
Logic minimization aside, it is trivial to come up with a classical AND-OR-NOT circuit implementing a given
truth table. Each line of the truth table is implemented using AND and NOT gates, then all lines are connected by
OR gates. The algorithm proposed in [5] solves a quantum version of this task.^3

The algorithm relies on the theorem from numerical analysis saying that an arbitrary matrix, not necessarily square,
can be decomposed into a product of a unitary matrix Q and an upper triangular matrix R [10]. We
are going to apply this theorem to unitary matrices, which makes R diagonal. The canonical algorithm for
QR decomposition is similar to the classical triangulation by row subtractions in that it zeroes out matrix elements
one by one. Since elementary row operations are typically not unitary, one instead applies a specially calculated
element of U(2) to a pair of rows so as to zero out a particular matrix element. Such matrices are known as
Givens rotations and can be viewed as gates (not yet elementary) in a quantum circuit for Q. It then remains
to find a decomposition for the diagonal component R. Circuits for diagonal matrices are not explicitly
addressed in [5], but are the subject of the work in [9]. The 2-qubit case addressed in the previous subsection is
sufficient for further developments below.

^3 We note that the work in [5] to a large extent relies on results in [1].

Since each Givens rotation is a non-trivial two-qubit matrix, it should be further decomposed into elementary
gates. In the generic case, the algorithm from [5] entails one Givens rotation to nullify each matrix entry below
the diagonal. Thus, a generic 4 x 4 unitary representing a 2-qubit computation will decompose into six Givens
rotations, each uniquely determined. The first rotation ($G_{3,4}$ in [5]) is between the states $|10\rangle$ and $|11\rangle$, whose
indices correspond to the last two rows of the matrix. This rotation can be thought of as a generic 1-qubit rotation
on the second qubit, controlled by the first qubit. The work in [1, 5] shows that such a controlled rotation topC-V
can be implemented using eight elementary gates from the same gate library that we use. Namely, decompose V
according to Equation 3 and use the parameters $\delta$, $\alpha$, $\theta$ and $\beta$ to define matrices

\[ A = \begin{pmatrix} e^{-i\alpha/2} & 0 \\ 0 & e^{i\alpha/2} \end{pmatrix}
\begin{pmatrix} \cos(\theta/4) & \sin(\theta/4) \\ -\sin(\theta/4) & \cos(\theta/4) \end{pmatrix} \tag{8} \]

\[ B = \begin{pmatrix} \cos(-\theta/4) & \sin(-\theta/4) \\ -\sin(-\theta/4) & \cos(-\theta/4) \end{pmatrix}
\begin{pmatrix} e^{i(\alpha+\beta)/4} & 0 \\ 0 & e^{-i(\alpha+\beta)/4} \end{pmatrix} \tag{9} \]

\[ C = \begin{pmatrix} e^{i(\alpha-\beta)/4} & 0 \\ 0 & e^{-i(\alpha-\beta)/4} \end{pmatrix} \tag{10} \]

\[ D = \begin{pmatrix} 1 & 0 \\ 0 & e^{i\delta} \end{pmatrix} \tag{11} \]

One can verify that $ABC = 1$ and $AXBXC = e^{-i\delta}V = \tilde V$. Therefore $\mathrm{topC\text{-}}V = (D \otimes 1) \circ (1 \otimes A) \circ
\mathrm{topCNOT} \circ (1 \otimes B) \circ \mathrm{topCNOT} \circ (1 \otimes C)$. This decomposition is illustrated in Figure 3 and implies 8 elementary
gates because A and B require two each.
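The identities ABC = 1 and the controlled-V formula can be confirmed numerically. This is a minimal sketch assuming NumPy, with A, B, C and D written as products of the elementary rotations R_z(a) = diag(e^{-ia/2}, e^{ia/2}) and R_y(t); the test angles are arbitrary and the helper names are illustrative.

```python
import numpy as np

def rz(a):
    return np.diag([np.exp(-1j * a / 2), np.exp(1j * a / 2)])

def ry(t):
    c, s = np.cos(t / 2), np.sin(t / 2)
    return np.array([[c, s], [-s, c]], dtype=complex)

I2 = np.eye(2, dtype=complex)
top_cnot = np.eye(4, dtype=complex)[[0, 1, 3, 2]]   # exchanges |10> and |11>

# Arbitrary test parameters for V = e^{i delta} rz(alpha) ry(theta) rz(beta).
delta, alpha, theta, beta = 0.3, 1.1, 0.7, -2.0
V = np.exp(1j * delta) * rz(alpha) @ ry(theta) @ rz(beta)

A = rz(alpha) @ ry(theta / 2)                 # two elementary gates
B = ry(-theta / 2) @ rz(-(alpha + beta) / 2)  # two elementary gates
C = rz(-(alpha - beta) / 2)                   # one elementary gate
D = np.diag([1, np.exp(1j * delta)])          # one elementary gate, up to phase

circuit = (np.kron(D, I2) @ np.kron(I2, A) @ top_cnot
           @ np.kron(I2, B) @ top_cnot @ np.kron(I2, C))
controlled_v = np.block([[I2, np.zeros((2, 2))],
                         [np.zeros((2, 2)), V]])
```

With these definitions `circuit` should equal `controlled_v`, and the gate tally is 2 + 2 + 1 + 1 one-qubit gates plus 2 CNOTs, i.e. eight.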

The next Givens rotation ($G_{2,3}$) is between states $|01\rangle$ and $|10\rangle$. It is not a controlled one-qubit rotation and
thus more difficult to implement. The remaining Givens rotations are between $|00\rangle$ and $|01\rangle$ ($G_{1,2}$), $|10\rangle$ and $|11\rangle$
($G_{3,4}$), $|01\rangle$ and $|10\rangle$ ($G_{2,3}$) as well as $|10\rangle$ and $|11\rangle$ ($G_{3,4}$). Four out of six are one-qubit rotations controlled by
the top line, which is the most significant qubit.

To perform accurate gate counts in the 2-qubit case, we first observe that an arbitrary 2-qubit diagonal matrix
can be implemented in five gates via Proposition 2.1. Of those five, two are CNOTs. The remaining effort is to
count gates in the six Givens rotations. Following [1, 5], let topC-V be any $V \in U(2)$ controlled on the top line
and acting on the second. Then viewing a 4 x 4 matrix as block-2 x 2, we obtain

\[ \mathrm{topC\text{-}}V = \begin{pmatrix} 1 & 0 \\ 0 & V \end{pmatrix} \quad\text{and}\quad (X \otimes 1) \circ \mathrm{topC\text{-}}V \circ (X \otimes 1) = \begin{pmatrix} V & 0 \\ 0 & 1 \end{pmatrix} \tag{12} \]

Observe that topC-V implements $G_{3,4}$ and, according to Figure 3, costs eight gates, of which two are CNOTs.
As shown in Equation 6, inverters cost two elementary gates each. Therefore the rotation $G_{1,2}$, implemented as
above, costs twelve gates.
With $\vec 0$ denoting the zero column vector with two entries, the rotation $G_{2,3}$ can be implemented as

\[ \mathrm{botCNOT} \circ \mathrm{topC\text{-}}(XVX) \circ \mathrm{botCNOT} = \begin{pmatrix} 1 & \vec 0^{\,t} & 0 \\ \vec 0 & V & \vec 0 \\ 0 & \vec 0^{\,t} & 1 \end{pmatrix} \tag{13} \]

The computation topC-(XVX), considered as topC-V, takes eight elementary gates, and thus $G_{2,3}$ can be
implemented in ten elementary gates, of which four are CNOTs.

In the generic case, the algorithm from [5] is going to use three $G_{3,4}$ Givens rotations totalling 24 elementary
gates of which 6 are CNOTs, two $G_{2,3}$ Givens rotations totalling 20 elementary gates of which 8 are CNOTs and one
$G_{1,2}$ Givens rotation which counts for 12 elementary gates including 2 CNOTs. Additionally, we use 5 elementary
gates (of which 2 are CNOTs) to implement the diagonal R via Proposition 2.1. Thus, 61 gates will be required in
the generic (worst) case, and 18 of those will be CNOTs.
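The tally above is simple arithmetic, reproduced here as a small sketch. The dictionary names are illustrative; the per-block costs are the ones stated in the text.

```python
# (total elementary gates, CNOTs among them) for each building block
COSTS = {
    "G34": (8, 2),    # one-qubit rotation controlled by the top line
    "G12": (12, 2),   # G34 conjugated by two inverters, two gates each
    "G23": (10, 4),   # controlled rotation conjugated by two botCNOTs
    "diag": (5, 2),   # diagonal remainder R, Proposition 2.1
}
# How often each block occurs in the generic two-qubit case.
USES = {"G34": 3, "G23": 2, "G12": 1, "diag": 1}

total_gates = sum(COSTS[name][0] * n for name, n in USES.items())
total_cnots = sum(COSTS[name][1] * n for name, n in USES.items())
print(total_gates, total_cnots)  # 61 18
```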

3.2 Other matrix decompositions: SVD, polar, symmetric Schur (spectral) and KAK

Golub and Van Loan [10, p. 73] define the Singular-Value Decomposition (SVD) for complex matrices as follows:
Definition 3.1 If $M \in \mathbb{C}^{m \times n}$, then there exist unitary matrices $U \in \mathbb{C}^{m \times m}$ and $V \in \mathbb{C}^{n \times n}$ such that

\[ U^* M V = \mathrm{diag}(\sigma_1, \ldots, \sigma_p) \in \mathbb{R}^{m \times n}, \qquad p = \min\{m, n\} \]

where the $\sigma_i$ are singular values and $\sigma_1 \ge \sigma_2 \ge \cdots \ge \sigma_p \ge 0$. For real-valued M, U and V must be orthogonal.

In this work we are only interested in the case m = n; moreover, n is typically a power of two.

Definition 3.2 The polar decomposition of M is M = PZ, where Z is unitary and P is Hermitian.

This can be derived from the SVD as follows [10, p. 149]. If $M = U\Sigma V^*$, then $M = (U\Sigma U^*)(UV^*) = PZ$.
This decomposition is analogous to the factorization of complex numbers $z = |z|e^{i\arg(z)}$ and intuitively similar
to writing any complex n x n matrix as a sum of a Hermitian and a skew-Hermitian matrix, in terms of matrix
elements: $m_{ij} = (m_{ij} + \bar m_{ji})/2 + (m_{ij} - \bar m_{ji})/2$. Skew-Hermitian matrices exponentiate to unitaries, and Hermitian
matrices exponentiate to Hermitian. However, in general $\exp(X+Y) \ne \exp(X)\exp(Y)$ unless XY = YX, and polar
decompositions cannot be computed by exponentiation. On the positive side, given an explicit M, $P^2$ can be
computed as $MM^*$, and a possible P can be found via a matrix square root. In our work, we need a more refined
version of the polar decomposition known from Lie theory [11]. The term unitary polar in the following definition
is ours.

Definition 3.3 The unitary polar decomposition of $M \in U(n)$ is M = PZ, where $Z \in SO(n)$ and $P = P^t$.
Since Z and M are unitary, so is P, demanding $P^{-1} = \bar P$.

Definition 3.4 The symmetric Schur decomposition [10, p. 393], also known as the spectral theorem to operator
theorists, states that $M = O\Lambda O^t$ where M is a real-valued symmetric n x n matrix, $\Lambda$ is diagonal and $O \in SO(n)$.
For a complex-valued Hermitian M, the matrix O will have to be in SU(n).

The symmetric Schur (spectral) decomposition can be interpreted as choosing a basis in which M is diagonal.
Since such a basis must consist of eigenvectors, the columns of O list eigenvectors of M in the initial basis and $\Lambda$
lists eigenvalues in the corresponding order.

Proposition 3.5 The following mild two-step generalization of the spectral theorem holds:

1. $\forall$ A, B symmetric real n x n matrices with AB = BA, $\exists\, O \in SO(n)$ such that $OAO^t$ and $OBO^t$ are diagonal;

2. $\forall\, P \in U(n)$ with $P = P^t$, $\exists\, O \in SO(n)$ such that $P = O\Lambda O^t$, where $\Lambda$ is diagonal with norm-one entries.

Proof:

1. It suffices to construct a basis which is simultaneously a basis of eigenvectors for both A and B. Thus, say
$V_\lambda$ is the $\lambda$ eigenspace of B. For $v \in V_\lambda$, $B(Av) = A(Bv) = \lambda Av$, i.e. A preserves the eigenspace. Now
find eigenvectors for A restricted to $V_\lambda$, which remains symmetric.

2. Consider the real and imaginary parts of $P = A + iB$. Now $1 = PP^* = P\bar P = (A + iB)(A - iB) = (A^2 + B^2) +
i(BA - AB)$. Since the imaginary part of 1 is 0, we conclude that AB = BA. The result follows from part 1.

□
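Part 2 of the proposition suggests a practical recipe: diagonalize a real combination of the commuting real and imaginary parts of P. The sketch below is one such recipe, not the authors' implementation; it assumes NumPy and assumes that a randomly chosen mixing angle separates the joint eigenspaces, which holds for generic inputs. The function name is hypothetical.

```python
import numpy as np

def diagonalize_symmetric_unitary(p, seed=0):
    """Return (o, lam) with o real, det(o) = 1 and p ~= o @ diag(lam) @ o.T."""
    rng = np.random.default_rng(seed)
    a, b = p.real, p.imag                      # commuting real symmetric parts
    t = rng.uniform(0.1, 1.0)                  # generic mixing angle (assumption)
    mix = np.cos(t) * a + np.sin(t) * b
    _, o = np.linalg.eigh((mix + mix.T) / 2)   # real orthonormal eigenvectors
    if np.linalg.det(o) < 0:                   # move from O(n) into SO(n)
        o[:, 0] = -o[:, 0]
    lam = np.diag(o.T @ p @ o)                 # norm-one diagonal entries
    return o, lam

# Example input: P = Q diag(phases) Q^t with Q real orthogonal.
rng = np.random.default_rng(1)
q, _ = np.linalg.qr(rng.normal(size=(4, 4)))
phases = np.exp(1j * rng.uniform(0, 2 * np.pi, size=4))
P = q @ np.diag(phases) @ q.T
O, lam = diagonalize_symmetric_unitary(P)
```

If two eigenvalues of the mixed matrix nearly coincide, the eigenvectors become ill-conditioned; a more careful implementation would retry with a different angle.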

The unitary polar decomposition and Proposition 3.5 can be combined to produce the following variant of the
SVD for unitary matrices. Suppose U = PZ by the unitary polar decomposition. Apply Proposition 3.5 to P and
write $U = PZ = O\Lambda(O^t Z) = V\Lambda W$ where $V, W \in O(n)$. Now multiply the first column of V and the first entry of $\Lambda$
by det(V), and then multiply the first row of W and the first entry of $\Lambda$ by det(W). Thus we obtain $V, W \in SO(n)$.

Definition 3.6 The normalized unitary KAK decomposition of $M \in U(n)$ is $M = V\Lambda W$, where $V, W \in SO(n)$ and
$\Lambda \in U(n)$ is diagonal. A related claim in terms of matrix groups is U(n) = KAK, where K = O(n) and A is the
group of diagonal unitary matrices.

The term Lie theory, in its modern use, refers to the mathematical theory of continuous matrix groups. Rather
than study individual matrices, as is common in numerical analysis, Lie theory studies collective behavior of
various types of matrices and often extends constructions from the group GL(n) to its continuous subgroups such
as O(n) and U(n).

The KAK decomposition is a far-reaching generalization of the SVD and dates back to the origins of Lie theory
in the 1920s. Knapp [11, p. 580] attributes it to Cartan [4]. The KAK decomposition of a reductive Lie group
G entails G = KAK where K is a maximal proper compact subgroup and A is a torus. A torus is a connected
Abelian group closed in G, and always a product of copies of the multiplicative group $(0, \infty)$ and U(1). The SVD
decomposition can be seen as a special case with $G = GL(n, \mathbb{C})$, $K = U(n)$ and the torus being the group of n x n
diagonal matrices with positive real entries. In our work, we use another special case of the KAK decomposition
with $G = U(n)$, $K = O(n)$ and the torus being the group of n x n diagonal unitary matrices. Section 4.1 shows a
surprising interpretation of O(4) in terms of one-qubit gates.

4 The Entangler Gate 

The entangler gate maps the computational basis into the "magic basis", which we introduce below. Together with
its inverse, the disentangler, the entangler gate is useful for breaking down arbitrary two-qubit computations
into elementary gates. With such uses in mind, we implement the entangler and disentangler by elementary gates.

4.1 $SU(2) \otimes SU(2) \cong SO(4)$ via the magic basis

The "magic basis" [13] provides an elegant way of thinking about tensor products of one-qubit gates.^4

^4 Stated in terms of the Lie algebra of U(4), this involves the isomorphism $\mathfrak{su}(2) \oplus \mathfrak{su}(2) \cong \mathfrak{so}(4)$ [11, p. 370].
Figure 2: Implementing E by elementary gates. Here S = diag(1, i) counts as one elementary gate and the
Hadamard gate H counts as two.



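Definition 4.1 The magic basis of phase-shifted Bell states is given by

\[ \begin{aligned}
|m1\rangle &= (|00\rangle + |11\rangle)/\sqrt2 \\
|m2\rangle &= (i|00\rangle - i|11\rangle)/\sqrt2 \\
|m3\rangle &= (i|01\rangle + i|10\rangle)/\sqrt2 \\
|m4\rangle &= (|01\rangle - |10\rangle)/\sqrt2
\end{aligned} \tag{14} \]

Note that each is maximally entangled, and the Arabic numbers are indices rather than energy states.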

Via a startling and omitted direct computation, the matrix coefficients of $A \otimes B$ (in the notation of Equation 1)
with respect to the magic basis will all be real. Hence $A \otimes B$ is orthogonal. For example,

\[ \begin{aligned}
(A \otimes 1)|m1\rangle &= (\alpha|00\rangle + \bar\beta|10\rangle - \beta|01\rangle + \bar\alpha|11\rangle)/\sqrt2 \\
&= \mathrm{Re}\,\alpha\,|m1\rangle + \mathrm{Im}\,\alpha\,|m2\rangle - \mathrm{Im}\,\beta\,|m3\rangle - \mathrm{Re}\,\beta\,|m4\rangle
\end{aligned} \tag{15} \]

Since changing basis does not change the determinant, these computations assert a (U(4)-inner) Lie-group
isomorphism between $SU(2) \otimes SU(2)$ and SO(4). Importantly, both are known to be connected [11, p. 68].

Theorem 4.2 (from [13]) The magic basis realizes the low dimensional isomorphism between $SU(2) \otimes SU(2)$
and SO(4). Specifically, for $V \in U(4)$ written with matrix coefficients relative to the magic basis of Equation 14,

\[ [V \in SO(4) \subset U(4)] \iff [V, \text{as a linear map of } \mathbb{C}[|0\rangle, |1\rangle] \otimes \mathbb{C}[|0\rangle, |1\rangle] \text{ to itself, equals } A \otimes B \text{ for } A, B \in SU(2)] \tag{16} \]

Cf. Equation 1 for the matrix for $A \otimes B$ in the computational basis.

Proof: Continuing as in Equation 15, consider all $(A \otimes 1)|mj\rangle$ and $(1 \otimes B)|mj\rangle$ to show that $SU(2) \otimes SU(2)$ maps
into SO(4). Now note SU(2) is three dimensional since $|\alpha|^2 + |\beta|^2 = 1$, so $SU(2) \otimes SU(2)$ is six dimensional. As
SO(4) has 3 + 2 + 1 real dimensions, this shows that the map defined above is onto the identity component. □
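The "omitted direct computation" is easy to spot-check numerically. The sketch below assumes NumPy; it stores the magic basis vectors as the columns of a matrix `E` in the computational basis order |00>, |01>, |10>, |11>, and the helper `random_su2` is illustrative. It changes basis for a random A ⊗ B and inspects the result.

```python
import numpy as np

# Columns are |m1>, |m2>, |m3>, |m4> written in the computational basis.
E = np.array([[1,  1j, 0,  0],
              [0,  0,  1j, 1],
              [0,  0,  1j, -1],
              [1, -1j, 0,  0]]) / np.sqrt(2)

def random_su2(rng):
    """Hypothetical helper: [[a, -b], [conj(b), conj(a)]] with |a|^2 + |b|^2 = 1."""
    a, b = rng.normal(size=2) + 1j * rng.normal(size=2)
    n = np.sqrt(abs(a) ** 2 + abs(b) ** 2)
    a, b = a / n, b / n
    return np.array([[a, -b], [np.conj(b), np.conj(a)]])

rng = np.random.default_rng(2)
A, B = random_su2(rng), random_su2(rng)

# Matrix of A (x) B with respect to the magic basis.
V = E.conj().T @ np.kron(A, B) @ E
is_real = np.allclose(V.imag, 0)
is_orthogonal = np.allclose(V.real @ V.real.T, np.eye(4))
has_det_one = np.isclose(np.linalg.det(V.real), 1.0)
```

All three flags should be `True`, in line with the claim that tensor products of SU(2) elements become elements of SO(4) in the magic basis.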

4.2 Definition and properties of E 

Definition 4.3 The entangler gate E is the two-qubit gate which maps the computational basis into the magic
basis: $|00\rangle \mapsto |m1\rangle$, $|01\rangle \mapsto |m2\rangle$, $|10\rangle \mapsto |m3\rangle$, and $|11\rangle \mapsto |m4\rangle$. The inverse gate $E^*$ is called the disentangler.

In terms of the computational basis, E has the following matrix:

\[ E = \frac{\sqrt2}{2} \begin{pmatrix} 1 & i & 0 & 0 \\ 0 & 0 & i & 1 \\ 0 & 0 & i & -1 \\ 1 & -i & 0 & 0 \end{pmatrix} \tag{17} \]

Now recall generally that for $g \in GL(n)$, a linear map L with matrix A subordinate to some basis $\{v_1, \ldots, v_n\}$ may
also be expressed in terms of the basis $\{gv_1, \ldots, gv_n\}$ via the conjugation map $A \mapsto gAg^{-1}$. In particular, E is also
given by the matrix above in the magic basis, and likewise $E^*$, and likewise any matrix commuting with E. The
typical use of E is the following Corollary of Theorem 4.2.



Corollary 4.4 Suppose $V \in SO(4)$, that is $V \in U(4)$ with det(V) = 1 and all real entries. Then via the change of
basis remark of the last paragraph, $EVE^*$ is a tensor product of one-line gates of the form of Equation 1.

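One finds that E can be realized up to global phase by seven elementary gates, as shown in Figure 2. This is
most easily verified by multiplying the appropriate 4 x 4 matrices. In particular, Equation 2 writes topCNOT and
botCNOT as permutation matrices. With that in mind, one can explicitly verify that

\[ E = \mathrm{botCNOT} \circ \mathrm{topCNOT} \circ \frac{\sqrt2}{2}
\begin{pmatrix} 1 & 1 & 0 & 0 \\ 1 & -1 & 0 & 0 \\ 0 & 0 & 1 & 1 \\ 0 & 0 & 1 & -1 \end{pmatrix}
\circ \mathrm{botCNOT} \circ
\begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & i & 0 \\ 0 & 0 & 0 & i \end{pmatrix}
\circ \mathrm{botCNOT} \tag{18} \]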



Note that the circuit diagram in Figure 2 is read left to right, so gate matrices are multiplied in reverse order. S =
diag(1, i) is an elementary gate up to global phase $e^{i\pi/4}$, and the Hadamard gate H can be implemented, up to
global phase, using two elementary gates as shown in Equation 5.

In summary, E requires four CNOT gates and three one-qubit rotations. Similarly, $E^*$ may be implemented in
seven elementary gates by writing the inverse of each gate of Figure 2 in reverse order.
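The factorization of E into CNOTs, a Hadamard gate on the bottom line and an S gate on the top line can be multiplied out mechanically. The following sketch assumes NumPy and the basis order |00>, |01>, |10>, |11>; variable names are illustrative.

```python
import numpy as np

H = np.array([[1, 1], [1, -1]]) / np.sqrt(2)
S = np.diag([1, 1j])
I2 = np.eye(2)
top_cnot = np.eye(4)[[0, 1, 3, 2]]   # exchanges |10> and |11>
bot_cnot = np.eye(4)[[0, 3, 2, 1]]   # exchanges |01> and |11>

E = np.array([[1,  1j, 0,  0],
              [0,  0,  1j, 1],
              [0,  0,  1j, -1],
              [1, -1j, 0,  0]]) / np.sqrt(2)

# Matrix product in the order of Equation 18 (the rightmost factor acts first).
entangler = (bot_cnot @ top_cnot @ np.kron(I2, H)
             @ bot_cnot @ np.kron(S, I2) @ bot_cnot)

# Disentangler: inverse of each gate, in reverse order.
disentangler = (bot_cnot @ np.kron(S.conj(), I2) @ bot_cnot
                @ np.kron(I2, H) @ top_cnot @ bot_cnot)
```

Here `entangler` should equal `E` exactly (not just up to phase), and `disentangler` should equal its conjugate transpose. Counting H as two elementary gates and S as one gives 4 + 2 + 1 = 7.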

5 An Arbitrary Two-qubit Computation in 23 Elementary Gates or Less 

In order to implement an arbitrary two-qubit computation with elementary gates, we first compute the normalized
unitary KAK decomposition $U = K_1 A K_2$ of its unitary matrix U. According to the "magic isomorphism" from
Section 4.1, if we view $K_1$ and $K_2$ in the basis of Bell states, they decompose into tensor products of generic
one-qubit computations, each requiring up to three one-qubit elementary gates. However, we then must view the
remaining diagonal matrix in the same basis as well. The remaining part is of the form $EAE^*$ for A diagonal and,
as shown below, can be implemented in 11 gates due to its pattern of zero entries.

5.1 Decomposition algorithm 

The matrix decomposition implied by Theorem 1.1 is derived below, and gate counts are in the next subsection.

Proposition 5.1 Let U be the matrix for any two-qubit computation in the computational basis, so that $E^*UE$
represents U in the magic basis. Then

\[ U = (U_1 \otimes U_2) \circ \mathrm{botCNOT} \circ \mathrm{topC\text{-}}U_3 \circ (1 \otimes U_4) \circ \mathrm{botCNOT} \circ (U_5 \otimes U_6) \tag{19} \]

where $U_1, \ldots, U_6$ are one-qubit gates on each line and topC-$U_3$ is controlled by the top line.

Proof: We are going to use the "canonical decomposition" of U(4), which is a combination of the KAK
decomposition of U(4) and the "magic isomorphism" of Section 4.1. The proof extends an algorithmic version of the
canonical decomposition towards elementary gates for quantum computation [1] in the spirit of [5].
In the algorithm below, steps 1-4 compute the normalized unitary KAK decomposition (see Definition 3.6) of
a given 2-qubit quantum computation U. Step 5 applies the magic isomorphism of Section 4.1 to separate four
generic one-qubit gates. Step 6 implements the remaining computation.



1. First, compute for E*UE = PK\ the unitary polar decomposition P — P', Ki e 5(9(4). To do so, note 



p2 ^ ppt ^ pK,K'P' = E*UEE'U'E. 



2. Now apply Proposition 13 .5 1 to P^. This produces P^ = K2DK2 ^ for K2 G 0(4), D diagonal. Furthermore, 
choose K2 € S0{4), so that EKjE* is a tensor product via CoroUarv 14.41 

3. Choose square roots entrywise on the diagonal to form $\sqrt{D}$, being careful to choose the signs of each root so
that in the product $\det\sqrt{D} = \det U$. This is in fact possible, since $\det P^2 = (\det U)^2$. Having so chosen $\sqrt{D}$,
compute $P = K_2\sqrt{D}K_2^{-1}$.

4. One can now compute $K_1 = P^{-1}E^*UE = \overline{P}E^*UE$, using that $P$ is unitary and symmetric. As $\det P = \det U$, in fact $K_1 \in SO(4)$.

5. Thus $E^*UE = PK_1 = K_2\sqrt{D}(K_2^{-1}K_1)$, whence

$$U = (EK_2E^*)(E\sqrt{D}E^*)(EK_2^{-1}K_1E^*)$$

upon conversion back to the computational basis. Using Corollary 4.4, we define $U_1$, $U_2$, $U_5$ and $U_6$ by

$$U_1 \otimes U_2 = EK_2E^* \quad\text{and}\quad U_5 \otimes U_6 = EK_2^{-1}K_1E^* \qquad (20)$$

Both expressions may now be broken into explicit tensor products of elements of $U(2)$.

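6. What remains is to describe the implementation of $E\sqrt{D}E^*$. For this, label $\sqrt{D} = \mathrm{diag}(a,b,c,d)$ with
complex entries from $U(1)$. Then

$$E\sqrt{D}E^* = \frac{1}{2}\begin{pmatrix} a+b & 0 & 0 & a-b \\ 0 & c+d & c-d & 0 \\ 0 & c-d & c+d & 0 \\ a-b & 0 & 0 & a+b \end{pmatrix} \qquad (21)$$

Multiplying by a botCNOT on the left flips rows two and four, while multiplying on the right flips columns
two and four. Thus,

$$E\sqrt{D}E^* = \mathrm{botCNOT} \circ \begin{pmatrix} U_4 & 0 \\ 0 & B \end{pmatrix} \circ \mathrm{botCNOT} \qquad (22)$$

for some $U_4, B \in U(2)$. Choose $U_3$ so that $U_3 = BU_4^{-1}$. Then the block-diagonal matrix $U_4 \oplus B$ may be
implemented via $U_4 \oplus B = (1 \oplus BU_4^{-1}) \circ (1 \otimes U_4) = (\mathrm{topC}\text{-}U_3) \circ (1 \otimes U_4)$.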
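The zero pattern used in step 6 can be checked numerically. The following is a minimal sketch, not part of the original algorithm statement. It assumes an explicit form of the magic-basis matrix $E$ (the definition from Section 4.1 is not reproduced in this section, so the matrix below is an assumption chosen to agree with the examples of Section 5.3), and all helper names are hypothetical.

```python
import cmath
import math

S = 1 / math.sqrt(2)
# Assumed explicit form of the magic-basis matrix E (columns: phase-shifted Bell states).
E = [[S, 1j * S, 0, 0],
     [0, 0, 1j * S, S],
     [0, 0, 1j * S, -S],
     [S, -1j * S, 0, 0]]
# botCNOT in the basis |00>, |01>, |10>, |11> (top qubit first): the bottom line
# controls the top line, so it exchanges |01> and |11> (rows/columns two and four).
BOT_CNOT = [[1, 0, 0, 0], [0, 0, 0, 1], [0, 0, 1, 0], [0, 1, 0, 0]]


def matmul(a, b):
    """Plain matrix product of two nested-list matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def dagger(a):
    """Conjugate transpose."""
    return [[a[j][i].conjugate() for j in range(len(a))] for i in range(len(a[0]))]


def split_middle_factor(a, b, c, d):
    """Return (M, blocks, U4, B, U3) for sqrt(D) = diag(a, b, c, d)."""
    root_d = [[a, 0, 0, 0], [0, b, 0, 0], [0, 0, c, 0], [0, 0, 0, d]]
    m = matmul(matmul(E, root_d), dagger(E))           # E sqrt(D) E*, as in (21)
    blocks = matmul(matmul(BOT_CNOT, m), BOT_CNOT)      # block diagonal, as in (22)
    u4 = [row[:2] for row in blocks[:2]]
    b_block = [row[2:] for row in blocks[2:]]
    u3 = matmul(b_block, dagger(u4))                    # U3 = B U4^{-1} (U4 is unitary)
    return m, blocks, u4, b_block, u3


# Four arbitrary unit-modulus entries for sqrt(D).
a, b, c, d = (cmath.exp(1j * t) for t in (0.3, 1.1, -0.7, 2.0))
M, BLOCKS, U4, B, U3 = split_middle_factor(a, b, c, d)
```

With these conventions the off-diagonal $2 \times 2$ blocks of `BLOCKS` vanish up to rounding, and `U3` times `U4` recovers `B`, which is the identity used to replace $U_4 \oplus B$ by a controlled gate followed by $1 \otimes U_4$.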

Note that this algorithm has several unspecified degrees of freedom that may affect gate counts for specific 2-qubit
computations. Arbitrary choices can be made in ordering eigenvectors in step 2 and choosing a square root of a
complex diagonal matrix in step 3. □
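The sign choice of step 3 can be sketched as follows. This is an illustrative helper under stated assumptions, not the authors' implementation: it takes principal square roots and, if needed, flips the sign of the last root, which is only one of the many admissible choices mentioned above. The function name and the tolerance are hypothetical.

```python
import cmath


def sqrt_diag_with_det(d_entries, target_det, tol=1e-9):
    """Pick entrywise square roots of d_entries whose product equals target_det.

    Assumes target_det**2 equals the product of d_entries (as in step 3), so the
    principal roots multiply to either +target_det or -target_det.
    """
    roots = [cmath.sqrt(z) for z in d_entries]
    product = 1
    for r in roots:
        product *= r
    if abs(product - target_det) < tol:
        return roots
    if abs(product + target_det) < tol:
        roots[-1] = -roots[-1]      # flipping one sign flips the determinant
        return roots
    raise ValueError("target_det**2 must equal the product of d_entries")


# Eigenvalue lists and determinants taken from Examples 5.3 and 5.4.
ROOTS_DEUTSCH = sqrt_diag_with_det([-1, 1, -1, 1], -1)
ROOTS_FOURIER = sqrt_diag_with_det([1j, 1j, 1, 1], -1j)
```

For these two inputs the helper happens to return the same roots as the examples of Section 5.3, namely $(i, 1, i, 1)$ and $(e^{i\pi/4}, e^{i\pi/4}, 1, -1)$. A different rule for which sign to flip would be equally valid and may change gate counts.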
Figure 3: The implementation of a controlled-$V$ gate [5, Figure 7]. The gates $A$, $B$, $C$ and $D$ are computed ibid.
for a given $V$. Here, $C$ and $D$ require one elementary gate each, while $A$ and $B$ require two each.

Figure 4: The decomposition of a generic 2-qubit quantum computation into up to 23 gates. Four generic one-qubit
rotations are marked with "3" because they require up to three elementary gates. Computations requiring
two or one elementary gates are marked similarly.



5.2 The overall gate decomposition and gate counts 



Proposition 5.1 decomposes an arbitrary two-qubit unitary into
$U = (U_1 \otimes U_2) \circ \mathrm{botCNOT} \circ \mathrm{topC}\text{-}U_3 \circ (1 \otimes U_4) \circ \mathrm{botCNOT} \circ (U_5 \otimes U_6)$,
where $U_1, \ldots, U_6$ are one-qubit gates. The immediate gate count yields:

• three elementary rotations for each of five one-qubit gates $U_1$, $U_2$, $U_4$, $U_5$ and $U_6$,

• two botCNOT gates,

• eight elementary gates to implement the topC-$U_3$ gate, according to [5, Figure 7].

The total gate count of 25 can be further reduced, given the structure of the topC-$V$ circuit in Figure 3. Indeed,
that circuit can be written symbolically as
$\mathrm{topC}\text{-}U_3 = (1 \otimes C) \circ \mathrm{topCNOT} \circ (1 \otimes B) \circ \mathrm{topCNOT} \circ (D \otimes A)$. $C$ and $D$
are elementary gates up to phase, but $A$ and $B$ require up to two elementary gates [5].

Since topC-$U_3$ is next to $(1 \otimes U_4)$ in Proposition 5.1, we can reduce $(D \otimes A) \circ (1 \otimes U_4)$ to $(D \otimes U_7)$ where
$U_7 = AU_4$. By merging the computation $A$ with the generic one-qubit computation $U_4$ that may require up to three
elementary gates, one reduces the overall circuit by two elementary gates.

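The overall circuit decomposition can be described algebraically as follows, with the factors of topC-$U_3$ in the same order as above and $(D \otimes A)$ merged with $(1 \otimes U_4)$:

$$U = (U_1 \otimes U_2) \circ \mathrm{botCNOT} \circ (1 \otimes C) \circ \mathrm{topCNOT} \circ (1 \otimes B) \circ \mathrm{topCNOT} \circ (D \otimes U_7) \circ \mathrm{botCNOT} \circ (U_5 \otimes U_6) \qquad (23)$$

It is illustrated in Figure 4, where gate counts are shown as well.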
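The arithmetic behind the counts 25 and 23 can be tallied mechanically. The sketch below only restates the worst-case costs given in the text (three elementary gates per generic one-qubit gate, two for $A$ and $B$, one for $C$ and $D$, one per CNOT); the data layout and names are illustrative assumptions.

```python
# Worst-case cost, in elementary gates, before merging A with U4.
NAIVE = {"U1": 3, "U2": 3, "U4": 3, "U5": 3, "U6": 3,
         "botCNOT (x2)": 2, "topC-U3": 8}

# Factors of the merged circuit, each with its worst-case cost and kind.
MERGED = [
    ("U1", 3, "rotation"), ("U2", 3, "rotation"),
    ("botCNOT", 1, "cnot"),
    ("C", 1, "rotation"), ("topCNOT", 1, "cnot"),
    ("B", 2, "rotation"), ("topCNOT", 1, "cnot"),
    ("D", 1, "rotation"), ("U7 = A U4", 3, "rotation"),
    ("botCNOT", 1, "cnot"),
    ("U5", 3, "rotation"), ("U6", 3, "rotation"),
]

naive_total = sum(NAIVE.values())
merged_total = sum(cost for _, cost, _ in MERGED)
cnot_total = sum(cost for _, cost, kind in MERGED if kind == "cnot")
rotation_total = merged_total - cnot_total
print(naive_total, merged_total, cnot_total, rotation_total)  # prints "25 23 4 19"
```

The saving of two gates comes entirely from replacing the pair $A$ (two gates) and $U_4$ (three gates) by the single generic gate $U_7$ (three gates).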

Our circuit decomposition requires at most four CNOTs, while other gates are elementary one-qubit rotations.
Such a small number of non-one-qubit gates may be desired in practical implementations where multi-qubit
interactions are more difficult to implement.

It is understood that Figure 4 and our gate counts refer to the worst case. Specific computations may require
only some of those gates. In particular, the next section shows three examples that all require fewer gates than in
the worst case. In those examples, our algorithm is able to capture the structure of the given quantum computation.
Unlike previously known circuit synthesis algorithms, ours can always implement $A \otimes B$ without using CNOT gates.

5.3 Examples

Several examples below follow the algorithm from Theorem 1.1. The order of eigenvectors and the choices of
square roots were aimed at improving gate counts, but this search was not exhaustive.

Example 5.2 Let $H \otimes H$ be the two dimensional Hadamard gate. Following our algorithm, $E^*(H \otimes H)E \in SO(4)$,
so that $P^2 = 1$ and we may choose $\sqrt{D} = P = 1$ and $K_2 = 1$. Then $K_1 = K_2^{-1}K_1 = E^*(H \otimes H)E$, and the algorithm
implements $H \otimes H$ in four elementary one-qubit gates. The CNOTs cancel. Similar comments apply to any $A \otimes B$. ◇
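The claim that $E^*(H \otimes H)E$ lies in $SO(4)$ can be checked directly. The sketch below is a numerical illustration only. The explicit matrix for $E$ is an assumption (Section 4.1 is not reproduced here), and the helper names are hypothetical.

```python
import math

S = 1 / math.sqrt(2)
# Assumed explicit form of the magic-basis matrix E.
E = [[S, 1j * S, 0, 0],
     [0, 0, 1j * S, S],
     [0, 0, 1j * S, -S],
     [S, -1j * S, 0, 0]]
H = [[S, S], [S, -S]]


def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def transpose(a):
    return [list(col) for col in zip(*a)]


def dagger(a):
    return [[z.conjugate() for z in row] for row in transpose(a)]


def kron(a, b):
    """Kronecker product; the first factor acts on the top qubit."""
    return [[a[i][j] * b[k][l] for j in range(len(a[0])) for l in range(len(b[0]))]
            for i in range(len(a)) for k in range(len(b))]


def det(m):
    """Determinant by Laplace expansion along the first row (fine for 4x4)."""
    if len(m) == 1:
        return m[0][0]
    return sum((-1) ** j * m[0][j] * det([row[:j] + row[j + 1:] for row in m[1:]])
               for j in range(len(m)))


K1 = matmul(matmul(dagger(E), kron(H, H)), E)   # E* (H tensor H) E
GRAM = matmul(K1, transpose(K1))                # identity if K1 is real orthogonal
```

If `K1` has vanishing imaginary parts, `GRAM` is the identity, and `det(K1)` equals 1, then `K1` is a real special orthogonal matrix, so $P^2 = K_1K_1^t = 1$ as stated in the example.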

Example 5.3 Let $f : \mathbb{Z}/2\mathbb{Z} \to \mathbb{Z}/2\mathbb{Z}$ be the flip map, i.e. $f(x) = x + 1$. The Deutsch algorithm as described,
e.g., in [14, p. 30], calls for a black-box gate $U_f$ with $U_f|x\rangle|y\rangle = |x\rangle|y + f(x)\rangle$, so that here $U_f$ swaps $|00\rangle \leftrightarrow |01\rangle$.
Thus, $U_f$ is easily implemented as $U_f = (X \otimes 1) \circ \mathrm{topCNOT} \circ (X \otimes 1)$ in five gates. Below, we decompose $U_f$ using
our algorithm.

First, we find the Hermitian part of the unitary polar decomposition of E*UfE. 



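$$E^*U_fE\,E^tU_f^t\overline{E} = PP^t = P^2 = \begin{pmatrix} 0 & 0 & 0 & 1 \\ 0 & 0 & 1 & 0 \\ 0 & 1 & 0 & 0 \\ 1 & 0 & 0 & 0 \end{pmatrix} \qquad (24)$$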
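Equation (24) and the five-gate circuit for $U_f$ can be reproduced with a short script. This is a sketch under assumptions: the explicit matrix for $E$ is not given in this section and is assumed here, the basis order is $|00\rangle, |01\rangle, |10\rangle, |11\rangle$ with the top qubit first, and the matrix `K2` is the eigenvector choice as read from (25).

```python
import math

S = 1 / math.sqrt(2)
# Assumed explicit form of the magic-basis matrix E.
E = [[S, 1j * S, 0, 0],
     [0, 0, 1j * S, S],
     [0, 0, 1j * S, -S],
     [S, -1j * S, 0, 0]]
U_F = [[0, 1, 0, 0], [1, 0, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]]        # swaps |00> and |01>
X_ON_TOP = [[0, 0, 1, 0], [0, 0, 0, 1], [1, 0, 0, 0], [0, 1, 0, 0]]   # X tensor 1
TOP_CNOT = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 0, 1], [0, 0, 1, 0]]   # top controls bottom


def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def transpose(a):
    return [list(col) for col in zip(*a)]


def dagger(a):
    return [[z.conjugate() for z in row] for row in transpose(a)]


CLASSICAL = matmul(matmul(X_ON_TOP, TOP_CNOT), X_ON_TOP)   # (X x 1) topCNOT (X x 1)
M = matmul(matmul(dagger(E), U_F), E)                      # E* U_f E
P2 = matmul(M, transpose(M))                               # P^2 = M M^t

# Eigenvector check P^2 K2 = K2 D for the ordering D = diag(-1, 1, -1, 1).
K2 = [[S, 0, 0, S], [0, S, -S, 0], [0, S, S, 0], [-S, 0, 0, S]]
D = [[-1, 0, 0, 0], [0, 1, 0, 0], [0, 0, -1, 0], [0, 0, 0, 1]]
LEFT, RIGHT = matmul(P2, K2), matmul(K2, D)
```

Under these assumptions `P2` is the anti-diagonal permutation matrix, whose eigenvalues are $\pm 1$ with multiplicity two each.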



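Now we must choose a basis of eigenvectors so as to diagonalize $P^2$. Since $P^2$ has both $\pm 1$ as double eigenvalues,
there are uncountably many ways to do this. Simplifying things slightly, choose

$$K_2 = \frac{\sqrt{2}}{2}\begin{pmatrix} 1 & 0 & 0 & 1 \\ 0 & 1 & -1 & 0 \\ 0 & 1 & 1 & 0 \\ -1 & 0 & 0 & 1 \end{pmatrix} \quad\text{so that}\quad U_1 \otimes U_2 = EK_2E^* = \frac{\sqrt{2}}{2}\begin{pmatrix} 1 & -1 \\ 1 & 1 \end{pmatrix} \otimes 1 \qquad (25)$$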



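Now the ordering of the column vectors of $K_2$ forces the diagonal $D = \mathrm{diag}(-1, 1, -1, 1)$ with $P^2 = K_2DK_2^{-1}$.
We choose $\sqrt{D} = \mathrm{diag}(i, 1, i, 1)$, being careful to ensure $\det\sqrt{D} = \det U_f = -1$.† Now putting $P = K_2\sqrt{D}K_2^{-1}$, define
$K_1 = \overline{P}E^*U_fE$. Then the one-line unitaries on the far side of the circuit may be computed as

$$EK_2^{-1}K_1E^* = \frac{e^{-i\pi/4}}{2}\begin{pmatrix} i & 1 & 1 & -i \\ 1 & i & -i & 1 \\ -i & -1 & 1 & -i \\ -1 & -i & -i & 1 \end{pmatrix} = e^{-i\pi/4}\,\frac{\sqrt{2}}{2}\begin{pmatrix} i & 1 \\ -i & 1 \end{pmatrix} \otimes \frac{\sqrt{2}}{2}\begin{pmatrix} 1 & -i \\ -i & 1 \end{pmatrix} \qquad (26)$$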



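To implement the latter in elementary matrices, one computes that

$$\frac{\sqrt{2}}{2}\begin{pmatrix} i & 1 \\ -i & 1 \end{pmatrix} = e^{i\pi/4}\,\frac{\sqrt{2}}{2}\begin{pmatrix} 1 & 1 \\ -1 & 1 \end{pmatrix}\begin{pmatrix} e^{i\pi/4} & 0 \\ 0 & e^{-i\pi/4} \end{pmatrix} \qquad (27)$$

while for the second factor similarly

$$\frac{\sqrt{2}}{2}\begin{pmatrix} 1 & -i \\ -i & 1 \end{pmatrix} = \begin{pmatrix} e^{-i\pi/4} & 0 \\ 0 & e^{i\pi/4} \end{pmatrix}\frac{\sqrt{2}}{2}\begin{pmatrix} 1 & 1 \\ -1 & 1 \end{pmatrix}\begin{pmatrix} e^{i\pi/4} & 0 \\ 0 & e^{-i\pi/4} \end{pmatrix} \qquad (28)$$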



† Failing to do so will cause $\det K_1 \neq 1$ eventually, at which point $EK_2^{-1}K_1E^*$ is not a tensor product of one-qubit computations.

Figure 5: Diagrams for the $U_f$ black box for Deutsch's algorithm, where $f : \mathbb{Z}/2\mathbb{Z} \to \mathbb{Z}/2\mathbb{Z}$ is $f(x) = x + 1$. The
typical implementation is shown at left, counting for 2+1+2 = 5 gates and one CNOT. One result of the current
algorithm is shown at right. Here, $U_1 = R_y(-\pi)$, $U_4 = e^{i\pi/4}R_z(-\pi/2)R_y(\pi)R_z(\pi/2)$, $U_5 = iR_y(\pi)R_z(-\pi/2)$, and
$U_6 = R_z(-\pi/2)R_y(\pi)R_z(\pi/2)$. Thus this instance of the algorithm produces 11 gates with two CNOTs.



Finally, in this example the conditioned element is not required to implement $E\sqrt{D}E^*$. Indeed,

$$\mathrm{botCNOT} \circ E\sqrt{D}E^* \circ \mathrm{botCNOT} = e^{i\pi/4}\,\frac{\sqrt{2}}{2}\begin{pmatrix} 1 & i & 0 & 0 \\ i & 1 & 0 & 0 \\ 0 & 0 & 1 & i \\ 0 & 0 & i & 1 \end{pmatrix} = e^{i\pi/4}\; 1 \otimes \frac{\sqrt{2}}{2}\begin{pmatrix} 1 & i \\ i & 1 \end{pmatrix} \qquad (29)$$

Thus, no conditioned gate is required within $E\sqrt{D}E^*$. Moreover, as we recently described the decomposition of
the complex conjugate, we see the $1 \otimes U_4$ factor above counts for three gates. Hence, our algorithm in this instance
produces a decomposition with 11 rather than 5 gates. It holds two CNOTs rather than one CNOT. ◇
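As a consistency check on Example 5.3, the factors can be multiplied back together. The sketch below is illustrative only: the one-qubit matrices are written out explicitly as read from (25), (26) and (29), so they should be treated as a reconstruction rather than as the authors' code, and the helper names are hypothetical.

```python
import cmath
import math

S = 1 / math.sqrt(2)
W = cmath.exp(1j * math.pi / 4)          # e^{i pi/4}


def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def kron(a, b):
    """Kronecker product; the first factor acts on the top qubit."""
    return [[a[i][j] * b[k][l] for j in range(len(a[0])) for l in range(len(b[0]))]
            for i in range(len(a)) for k in range(len(b))]


I2 = [[1, 0], [0, 1]]
BOT_CNOT = [[1, 0, 0, 0], [0, 0, 0, 1], [0, 0, 1, 0], [0, 1, 0, 0]]
U_F = [[0, 1, 0, 0], [1, 0, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]]

U1 = [[S, -S], [S, S]]                                    # from (25); U2 = 1
U4 = [[W * S, 1j * W * S], [1j * W * S, W * S]]           # from (29); U3 = 1
U5 = [[1j * S / W, S / W], [-1j * S / W, S / W]]          # from (26), phase included
U6 = [[S, -1j * S], [-1j * S, S]]                         # from (26)

# U = (U1 x 1) o botCNOT o (1 x U4) o botCNOT o (U5 x U6)
FACTORS = [kron(U1, I2), BOT_CNOT, kron(I2, U4), BOT_CNOT, kron(U5, U6)]
U = FACTORS[0]
for factor in FACTORS[1:]:
    U = matmul(U, factor)

max_error = max(abs(U[i][j] - U_F[i][j]) for i in range(4) for j in range(4))
```

With one gate for $U_1$, three for $U_4$, two for $U_5$, three for $U_6$ and two CNOTs, this circuit accounts for the 11 gates quoted above.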

Example 5.4 One case of the algorithm also produces a 14-gate decomposition of the quantum Fourier transform
$\mathcal{F}$, in contrast to the usual 12-gate implementation. It has four rather than five two-qubit elementary gates (CNOTs).
Specifically, we write $|00\rangle, \ldots, |11\rangle$ as $|0\rangle, \ldots, |3\rangle$. Then the discrete Fourier transform $\mathcal{F}$ is given by

$$\mathcal{F}|j\rangle = \frac{1}{2}\sum_{k=0}^{3} i^{jk}|k\rangle \quad\text{so that}\quad \mathcal{F} = \frac{1}{2}\begin{pmatrix} 1 & 1 & 1 & 1 \\ 1 & i & -1 & -i \\ 1 & -1 & 1 & -1 \\ 1 & -i & -1 & i \end{pmatrix} \qquad (30)$$



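Thus, the square of the Hermitian part of $E^*\mathcal{F}E$ is

$$E^*\mathcal{F}E\,E^t\mathcal{F}^t\overline{E} = PP^t = P^2 = \frac{\sqrt{2}}{2}\begin{pmatrix} e^{i\pi/4} & 0 & 0 & e^{-i\pi/4} \\ 0 & e^{i\pi/4} & e^{3i\pi/4} & 0 \\ 0 & e^{3i\pi/4} & e^{i\pi/4} & 0 \\ e^{-i\pi/4} & 0 & 0 & e^{i\pi/4} \end{pmatrix} \qquad (31)$$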



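Now we must diagonalize $P^2$. As the eigenvalues are 1 with multiplicity two and $i$ with multiplicity two, there are
infinitely many possible eigen-bases of $\mathbb{C}^4$. Choosing one such for the columns of $K_2$ with determinant 1, say

$$K_2 = \frac{\sqrt{2}}{2}\begin{pmatrix} 1 & 0 & 0 & 1 \\ 0 & 1 & -1 & 0 \\ 0 & 1 & 1 & 0 \\ -1 & 0 & 0 & 1 \end{pmatrix} \quad\text{so that}\quad U_1 \otimes U_2 = EK_2E^* = \frac{\sqrt{2}}{2}\begin{pmatrix} 1 & -1 \\ 1 & 1 \end{pmatrix} \otimes 1 \qquad (32)$$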



Now the ordering of the column vectors of $K_2$ forces the diagonal $D = \mathrm{diag}(i, i, 1, 1)$ with $P^2 = K_2DK_2^{-1}$. The
next step is to choose $\sqrt{D}$ so that $\det\sqrt{D} = \det E\mathcal{F}E^* = \det\mathcal{F} = -i$. Our choice is $\sqrt{D} = \mathrm{diag}(e^{i\pi/4}, e^{i\pi/4}, 1, -1)$.

Figure 6: Shown are circuits for the Fourier transform: Standard (left) and produced by our algorithm (right).
$U_1 = R_y(3\pi)$, $U_3 = e^{-i\pi/4}X$, $U_5 = TH = e^{-3i\pi/8}R_z(\pi/4 - \pi)R_y(\pi)$, and $U_6 = -T = (-1)e^{i\pi/8}R_z(\pi/4)$. Counting
the conditioned $U_3$ as seven gates, we get 2+1+1+7+1+2 = 14 gates total.

Then $P = K_2\sqrt{D}K_2^{-1}$, so that $K_1 = \overline{P}E^*\mathcal{F}E$ is complicated.‡ On the other hand,



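$$K_2^{-1}\overline{P} = \sqrt{D}^{\,-1}K_2^{-1} = \frac{\sqrt{2}}{2}\begin{pmatrix} e^{-i\pi/4} & 0 & 0 & -e^{-i\pi/4} \\ 0 & e^{-i\pi/4} & e^{-i\pi/4} & 0 \\ 0 & -1 & 1 & 0 \\ -1 & 0 & 0 & -1 \end{pmatrix} \qquad (33)$$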



Thus, with some more matrix computations one computes that on the other side

$$U_5 \otimes U_6 = EK_2^{-1}K_1E^* = [\mathrm{diag}(1, e^{i\pi/4}) \circ H] \otimes \mathrm{diag}(e^{-i\pi/4}, -1) \qquad (34)$$

Note the first tensor factor would be more commonly referred to as $T \circ H = e^{i\pi/8}R_z(\pi/4)(-i)R_z(-\pi)R_y(\pi) =
e^{-3i\pi/8}R_z(\pi/4 - \pi)R_y(\pi)$. On the other hand, more commonly $U_6 = -T = (-1)e^{i\pi/8}R_z(\pi/4)$, so that $U_5 \otimes U_6$
counts for 2 + 1 = 3 gates. This concludes the derivation of the outside one-line unitaries.

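Finally, we implement $E\sqrt{D}E^*$. The spacing of the zeroes in $E\sqrt{D}E^*$ causes $\mathrm{botCNOT} \circ E\sqrt{D}E^* \circ \mathrm{botCNOT}$
to be block diagonal, specifically (in $2 \times 2$ blocks)

$$\mathrm{botCNOT} \circ E\sqrt{D}E^* \circ \mathrm{botCNOT} = \begin{pmatrix} \mathrm{diag}(e^{i\pi/4}, e^{i\pi/4}) & 0 \\ 0 & X \end{pmatrix} \qquad (35)$$

Thus $U_4 = \mathrm{diag}(e^{i\pi/4}, e^{i\pi/4})$, which is $1$ up to phase and does not cost any gates. For $U_3$,

$$e^{-i\pi/4}X = \begin{pmatrix} 0 & e^{-i\pi/4} \\ e^{-i\pi/4} & 0 \end{pmatrix} = \begin{pmatrix} e^{-3i\pi/4} & 0 \\ 0 & e^{-3i\pi/4} \end{pmatrix}\begin{pmatrix} 0 & 1 \\ -1 & 0 \end{pmatrix}\begin{pmatrix} -i & 0 \\ 0 & i \end{pmatrix} \qquad (36)$$

Thus, in the notation from Section 2 (and from [1, 5]), $\delta = -3\pi/4$, $\alpha = 0$, $\theta = \pi$, and $\beta = \pi$. Therefore the
conditioned $e^{-i\pi/4}X$ may be realized in 7 gates. As the unitaries $U_1$, $U_2 = 1$, $U_5$ and $U_6$ (see Figure 6(b)) together
require 5 gates, we have 14 gates total, of which 2 are botCNOTs and 2 are topCNOTs.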

Compare the above to the standard $\mathcal{F} = \mathrm{botCNOT} \circ \mathrm{topCNOT} \circ \mathrm{botCNOT} \circ (1 \otimes H) \circ (\mathrm{botC}\text{-}S) \circ (H \otimes 1)$
illustrated in Figure 6(a). The conditioned $S$ can be implemented in 5 gates as shown in Figure 3. Thus, the standard
circuit for the two-qubit Fourier transform has 12 elementary gates. While this circuit has two gates fewer than
the circuit produced by our algorithm, it contains 5 rather than 4 CNOT gates. Since multi-qubit interactions are
relatively expensive in many quantum implementation technologies, the choice between the two circuits may
depend on specific technology parameters and implementation objectives. ◇
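Both circuits for the two-qubit Fourier transform can be compared against the matrix in (30). The following sketch is illustrative: the gate matrices for the right-hand circuit are written out as read from (32), (34), (35) and (36), the basis order is $|00\rangle, |01\rangle, |10\rangle, |11\rangle$ with the top qubit first, and all names are hypothetical.

```python
import cmath
import math

S = 1 / math.sqrt(2)
W = cmath.exp(1j * math.pi / 4)          # e^{i pi/4}


def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def kron(a, b):
    return [[a[i][j] * b[k][l] for j in range(len(a[0])) for l in range(len(b[0]))]
            for i in range(len(a)) for k in range(len(b))]


def product(factors):
    """Matrix product of the factors, leftmost factor applied last."""
    out = factors[0]
    for factor in factors[1:]:
        out = matmul(out, factor)
    return out


I2 = [[1, 0], [0, 1]]
H = [[S, S], [S, -S]]
BOT_CNOT = [[1, 0, 0, 0], [0, 0, 0, 1], [0, 0, 1, 0], [0, 1, 0, 0]]
TOP_CNOT = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 0, 1], [0, 0, 1, 0]]
BOT_C_S = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1j]]   # controlled-S
F = [[1j ** (j * k) / 2 for k in range(4)] for j in range(4)]         # matrix of (30)

# Standard circuit: three CNOTs (a swap), Hadamards and a controlled-S.
STANDARD = product([BOT_CNOT, TOP_CNOT, BOT_CNOT, kron(I2, H), BOT_C_S, kron(H, I2)])

# Circuit from the decomposition: U2 = 1, U4 = e^{i pi/4} 1, U3 = e^{-i pi/4} X.
U1 = [[S, -S], [S, S]]
U4 = [[W, 0], [0, W]]
U5 = [[S, S], [W * S, -W * S]]                                        # diag(1, W) o H
U6 = [[1 / W, 0], [0, -1]]
TOP_C_U3 = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 0, 1 / W], [0, 0, 1 / W, 0]]
OURS = product([kron(U1, I2), BOT_CNOT, TOP_C_U3, kron(I2, U4), BOT_CNOT, kron(U5, U6)])
```

Both `STANDARD` and `OURS` should agree with `F` entry by entry up to rounding; the sketch checks the matrices only and does not count elementary gates.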



‡ Moreover, we had to carefully choose $\det\sqrt{D} = \det\mathcal{F}$ to ensure $\det K_1 = 1$. Otherwise $\det K_2^{-1}K_1 \neq 1$, so that $EK_2^{-1}K_1E^* \notin U(2) \otimes U(2)$.

6 Gate Counts Versus Degrees of Freedom: Lower and Upper Bounds



We have constructively shown in the previous section that any two-qubit quantum computation can be implemented
in 23 elementary gates or fewer, of which at most 4 are CNOTs and the remaining gates are one-qubit rotations.
As we do not know if this result can be improved, we show that at least 17 elementary gates are required.

Theorem 6.1 There exists a two-qubit computation such that any circuit implementing it in terms of elementary
gates consists of at least 17 gates. In particular, 15 one-qubit rotations are required and two CNOTs.

Proof: First, recall that two-qubit quantum computations can be represented by $4 \times 4$ unitary matrices, and such
matrices can be normalized to have determinant one because quantum measurement is not affected by global
phase. Also recall that we use two types of elementary gates: (1) one-qubit rotations with one real parameter each,
and (2) CNOTs which operate on two qubits and are fully specified (no parameters).

Let us now consider the set $Q_C$ of quantum computations that can be performed by some given two-qubit
circuit $C$ with fixed topology, where the parameters of one-qubit rotations are allowed to vary. Fixed circuit
topology means that [the graph of] connections between elementary gates cannot be changed. Since the overall
unitary matrix can be expressed in terms of products and tensor products of the matrices of elementary gates, each
matrix element is an infinitely differentiable function of the parameters of one-qubit rotations (more precisely, it is
an algebraic function of sin and cos of those parameters). In other words, the set $Q_C$ is parameterized by one-qubit
rotations and has the local structure of a differentiable manifold, whose topological dimension in $GL(4)$ is at most the
number of one-qubit rotations in $C$ with variable parameters. The topological dimension is, roughly speaking, the
number of degrees of freedom.

Since every computation can be implemented by a limited number of elementary gates, the set of possible
circuit topologies is finite. The set of all implementable quantum computations is a union of sets $Q_C$ over the finite
set of possible circuit topologies. Its topological dimension is the maximum of topological dimensions of $Q_C$, i.e.,
the maximum number of one-qubit rotations with varying parameters, allowed in one circuit.

On the other hand, $\bigcup_C Q_C = SU(4)$. We compute its topological dimension as follows. First, we point out that
the matrix logarithm (which is infinitely differentiable) locally maps $U(4)$ one-to-one onto the set of skew-Hermitian
matrices: $UU^* = 1 \Rightarrow \log(U) + \log(U^*) = \log(U) + (\log(U))^* = 0$. Furthermore, $4 \times 4$ skew-Hermitian
matrices have 4 independent real parameters on the diagonal and are otherwise completely determined by their 6 complex
upper-diagonal elements. Thus, the set of skew-Hermitian matrices has topological dimension $4 + 2 \cdot 6 = 16$, and the same is
true about $U(4)$. Subtracting 1 for global phase, we see that 15 one-qubit rotations are needed to implement some
two-qubit computations. A randomly chosen computation is such with probability 1, i.e., almost always rather
than always.

If no CNOT gates are used in a given two-qubit circuit, the two lines never interact, and the two independent
one-qubit computations can be implemented in 3 elementary rotations each. Therefore, two-qubit computations
implementable without CNOTs have only 6 degrees of freedom. Similarly, if only one CNOT is allowed, then only
$4 \times 3 = 12$ rotations can be placed on two lines to the left and to the right of the CNOT to avoid gate reductions. This
proves that at least 2 CNOT gates are necessary to implement any two-qubit computation requiring 15 rotations. □
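The counting in this proof can be written out as a few lines of arithmetic. The sketch below is only an illustration: the function `max_rotations` extends the counting rule used above for zero and one CNOT (at most three rotations per line in each segment between CNOTs) to any number of CNOTs, which is an assumption made here for convenience and not a claim of the proof.

```python
def dim_unitary_group(num_qubits):
    """Real dimension of U(2^n), counted via skew-Hermitian matrices."""
    size = 2 ** num_qubits
    diagonal_reals = size                      # one real parameter per diagonal entry
    upper_complex = size * (size - 1) // 2     # complex entries above the diagonal
    return diagonal_reals + 2 * upper_complex


def max_rotations(num_cnots, lines=2, per_gate=3):
    """Assumed cap on non-redundant rotations in a circuit with num_cnots CNOTs."""
    return per_gate * lines * (num_cnots + 1)


needed = dim_unitary_group(2) - 1              # subtract one for global phase
min_cnots = next(k for k in range(10) if max_rotations(k) >= needed)
lower_bound = needed + min_cnots
print(needed, min_cnots, lower_bound)          # prints "15 2 17"
```

Under this rule zero CNOTs allow 6 rotations and one CNOT allows 12, both short of the 15 parameters of $SU(4)$, which is how the bound of 15 + 2 = 17 gates arises.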
Figure 7: The overall structure entailed by our circuit decomposition. Four generic one-qubit rotations are marked
with "3" because they are worth up to three elementary gates. Two Hadamard gates are marked with "2" because
they are worth two elementary gates. Constant gates are in bold.

Given the lower bound in Theorem 6.1, the 19 non-constant one-qubit rotations in Figure 4 seem redundant as
only 15 rotations are required for dimension reasons. To this end, we offer another generic gate decomposition for
arbitrary 2-qubit computations that entails no more than 15 non-constant one-qubit rotations, at the price of some
constant rotations and significantly more CNOT gates than used by our main decomposition in Figure 4.

Recall from Proposition 5.1 that an arbitrary two-qubit unitary can be decomposed into $U = (U_1 \otimes U_2) \circ
(EDE^*) \circ (U_3 \otimes U_4)$ where $U_1, \ldots, U_4$ are one-qubit gates and $D$ is a diagonal unitary. In this context, we use
circuit decompositions for $E$, $E^*$ and $D$ given in Sections 2 and 3. The matrix $D$ is controlled by 3 real parameters
(4 diagonal unitaries modulo global phase). It is implemented in Figure 2 using 3 one-qubit rotations and 2 CNOTs.
The entangler $E$ and disentangler $E^*$ are fixed matrices and require no parameters. The implementation of $E$ in
Figure 1 requires 3 constant rotations and 4 CNOTs.

Adding up gate counts, we see that $U_1, \ldots, U_4$ may require up to 12 elementary gates altogether. $D$ counts for
5, while $E$ and $E^*$ count for 7 each, for a total of 31. However, upon inspection of Figures 1 and 2, one notes
that the circuit $EDE^*$ has two canceling botCNOT gates. Moreover, since the inverse of $D$ is, too, a diagonal
unitary matrix, we can "flip" the asymmetric circuit for $D$ in Figure 2. This allows us to merge a constant rotation
from $E$ with a variable rotation from $D$. The resulting circuit decomposition is illustrated in Figure 7 and requires
up to 28 elementary gates total, of which 15 are variable one-qubit rotations, 5 are constant rotations and 8 are
CNOTs. The slight asymmetry in Figure 7 is explained by the asymmetric circuit for $D$ in Figure 2.
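The totals 31 and 28 follow from the per-component counts quoted in the two paragraphs above. A small tally, with an illustrative data layout that is not taken from the paper, makes the bookkeeping explicit.

```python
# Worst-case counts per component, as quoted in the text.
PARTS = {
    "U1..U4": {"variable": 12, "constant": 0, "cnot": 0},
    "D":      {"variable": 3,  "constant": 0, "cnot": 2},
    "E":      {"variable": 0,  "constant": 3, "cnot": 4},
    "E*":     {"variable": 0,  "constant": 3, "cnot": 4},
}


def totals(parts):
    """Sum each kind of gate over all components."""
    kinds = ("variable", "constant", "cnot")
    return {kind: sum(part[kind] for part in parts.values()) for kind in kinds}


raw = totals(PARTS)
reduced = dict(raw)
reduced["cnot"] -= 2        # two botCNOT gates cancel inside E D E*
reduced["constant"] -= 1    # one constant rotation merges into a variable rotation

raw_total = sum(raw.values())
reduced_total = sum(reduced.values())
print(raw_total, reduced_total, reduced)
# prints "31 28 {'variable': 15, 'constant': 5, 'cnot': 8}"
```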

The following is a summary of our upper and lower bounds for worst-case optimal 2-qubit circuits: 

(a) an upper bound of 23 elementary gates;

(b) a lower bound of 17 elementary gates;

(c) an upper bound of 4 CNOT gates;

(d) a lower bound of 2 CNOT gates;

(e) an upper bound of 19 one-qubit rotations;

(f) an upper bound of 15 variable elementary rotations;

(g) a lower bound of 15 variable elementary rotations.

In our on-going work we show that three CNOT gates are necessary and that the resulting lower bound of 18
elementary gates is tight. The implied decomposition contains at most 15 elementary rotations.

7 Conclusions and On-going Work



It is a well-known result that any one-qubit computation can be implemented using three rotations or fewer [1].
Our work answers a similar question about arbitrary two-qubit computations assuming that CNOT gates can be
used in addition to single-qubit rotations, without ancilla qubits. First, we show a lower bound that calls for at
least seventeen elementary gates: fifteen rotations and two CNOTs. We then constructively prove that twenty-three
elementary gates suffice to implement an arbitrary two-qubit computation. At most four of those are CNOTs and
the rest are single-qubit gates. In comparison, a previously known construction [5] implies sixty-one gates of
which eighteen are CNOTs. While this construction is more general than ours, for two-qubit computations, our
algorithm generates far fewer gates in the worst (generic) case. The savings in the number of multi-qubit gates
(CNOTs) are particularly dramatic.

In terms of techniques for the synthesis of quantum circuits, our work emphasizes the following general ideas: 

• changing the computational basis to maximally-entangled states by applying specially-designed gates with 
the purpose of recognizing quantum computations implementable with one-qubit gates only; 

• systematic use of matrix decompositions from numerical analysis and Lie theory: polar, spectral and KAK; 

• focus on matrix decompositions that are intrinsic to unitary matrices, e.g., KAK of $U(4)$, and include multiple
non-trivial unitary factors;

• incremental reduction of existing quantum circuits by local optimization; exploiting degrees of freedom in 
circuit synthesis may be useful to expose additional reductions. 

Specifically, we formalize the "canonical decomposition" of two-qubit computations [13, 12] as an instance of
the KAK decomposition from Lie theory [11] for $U(4)$ with $K = O(4)$ and $A$ diagonal. We propose an algorithm
to compute the KAK components and observe that elements of $O(4)$ can be interpreted in the "magic basis" as
pairs of one-qubit unitaries. Therefore, we change basis for all related matrices and further decompose them into
elementary gates for quantum computation.

In our on-going work, with additional techniques, we are able to improve the lower bound to 18 elementary 
gates and show that it is tight. 

We are also attempting to extend these ideas to three qubits or more. Two obstacles arise immediately: 

• Entanglement for three qubits is far more complicated than it is for two qubits. In particular, no known
"magic basis" makes local unitaries tractable, and there are distinct notions of maximally-entangled states.

• The use of the KAK decomposition does not automatically generalize beyond two qubits because $K \subset U(2^n)$
must be a sufficiently large subgroup, in the sense that $U(2^n)/K$ must be a Riemannian symmetric space
[11, 12]. Although both $O(4)$ and $U(2) \times U(2)$ are large subgroups of $U(4)$, the set of local unitary gates
$\otimes_{j=1}^{n} U(2)$ is not large enough in $U(2^n)$ for $n \geq 3$. In particular, one does not expect a decomposition of the
type $U_1 = U_2DU_3$ for $U_1 \in U(8)$, $D$ diagonal, and $U_2, U_3 \in U(2) \otimes U(2) \otimes U(2)$.

With little hope for a direct matrix decomposition involving local unitaries, it remains possible, in principle,
to construct a multi-step recursive decomposition. A related example is available in [16].